Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514

asaberui1 · 2025-11-30T13:43:58Z

Hi!

Thanks for your work! I really appreciate your effort in building a unified benchmark. This PR addresses a bug in the reasoning_gym/games/countdown module where the SymPy symbol replacement logic fails when the number of input numbers (num_terms) is 10 or greater.

🐛 Problem

The current implementation for generating and validating the arithmetic expression breaks down when there are more than 9 unique symbols (representing the input numbers). This causes the expression string to contain incorrect, phantom numbers, leading to an AssertionError during the final answer validation.

The error occurs because the original symbol naming scheme creates ambiguity when the indices transition from single to double digits.

🐞 Root Cause

The original symbol generation used a format like syms = symbols(f"x:{num_terms}"), resulting in symbols like x0, x1, ..., x9, x10, x11, ....

During the subsequent string replacement to substitute the symbol names (x10, x11, etc.) with the actual numbers, the string "10" within "x10" was mistakenly matched and replaced before the full "x10" symbol, causing a collision.

Example Trace:
In an example with 15 numbers:

q="Calculate 4109 using all of these numbers: 288, 258, 234, 214, 292, 221, 283, 249, 214, 251, 251, 216, 261, 289, 253.

Symbols x10 through x14 were incorrectly processed, leading to the unexpected presence of numbers like -2510, -2511, etc. (which are not in the available set) in the final expression string:

a="-2510 - 2511 + 2512 + 2513 - 2514 + 253*(214 + 251 - 251 + 258 - 234 - 249 + 288 - 261) + 283"

This expression is obviously incorrect. The symbol generation is corrected to explicitly include an underscore separator, ensuring that each symbol name is unique and unambiguous during string replacement, regardless of the number of terms.

✅ Fix

The line is updated from:

# Old logic:
# syms = symbols(f"x:{num_terms}")

to

# New logic:
syms = symbols(f"x_{0}:{num_terms}")

The following logic for expr_str generation has been updated accordingly to handle the new x_N format.

With this fix, the expression is generated correctly using only the valid input numbers:

a="-221 - 214 + 216 + 289 - 292 + 253*(214 + 251 - 251 + 258 - 234 - 249 + 288 - 261) + 283"

And the answer is 4109, which is correct.

Updated symbol naming and added safe replacement for expressions.

Modified return statement to include the modified expression string.

olliestanley · 2025-12-08T18:30:40Z

Thanks for submitting this fix. Could you please also add a unit test that covers this scenario?

Add test for CountdownDataset with more than 10 numbers

asaberui1 · 2025-12-09T02:46:13Z

I am glad to help adding a unit test. With the test_countdown_more_numbers function, if you run pytest with old versions, it should raise error:

# pytest tests/test_countdown.py 
.......F..                                                                                   [100%]
============================================= FAILURES =============================================
___________________________________ test_countdown_more_numbers ____________________________________

    def test_countdown_more_numbers():
        """Test when min_numbers exceed 10"""
        dataset = CountdownDataset(CountdownConfig(min_numbers=11, max_numbers=11, shuffle=False, size=5, seed=42))  # Set 11 engaged numbers for testing
    
        for item in dataset:
>           assert item["metadata"]["target"] == int(eval(item["metadata"]["expression"]))
E           AssertionError: assert 269 == 496
E            +  where 496 = int(496)
E            +    where 496 = eval('73 - 32 + 320 + 64 + 71 + 84 + 88 - 44 + 33 - 98 - 63')

tests/test_countdown.py:121: AssertionError
===================================== short test summary info ======================================
FAILED tests/test_countdown.py::test_countdown_more_numbers - AssertionError: assert 269 == 496
1 failed, 9 passed in 1.57s

With the newest version, pytest just simply passed:

# pytest tests/test_countdown.py 
..........                                                                                   [100%]
10 passed in 1.69s

Refactor CountdownDataset initialization for readability.

asaberui1 · 2025-12-10T01:25:35Z

I'm sorry that I didn't do pre-commit checks. I have fixed those errors and now it should work.

# pre-commit run --all-files
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...............................................................Passed
check for added large files..............................................Passed
black....................................................................Passed
isort (python)...........................................................Passed

It works on my computer.

Refactor expression generation and substitution logic

85629bd

Updated symbol naming and added safe replacement for expressions.

asaberui1 marked this pull request as draft November 30, 2025 14:11

Add expr_str to return values in countdown.py

7b6283e

Modified return statement to include the modified expression string.

asaberui1 marked this pull request as ready for review November 30, 2025 14:13

Implement test for min_numbers exceeding 10

c24c439

Add test for CountdownDataset with more than 10 numbers

asaberui1 added 2 commits December 10, 2025 09:21

Remove trailing-whitespace

8a40240

Improve readability of CountdownDataset initialization

05e279c

Refactor CountdownDataset initialization for readability.

olliestanley approved these changes Dec 15, 2025

View reviewed changes

olliestanley merged commit 7d68a6c into open-thought:main Dec 15, 2025
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514

Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514

Uh oh!

asaberui1 commented Nov 30, 2025 •

edited

Loading

Uh oh!

olliestanley commented Dec 8, 2025

Uh oh!

asaberui1 commented Dec 9, 2025

Uh oh!

asaberui1 commented Dec 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514

Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514

Uh oh!

Conversation

asaberui1 commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐛 Problem

🐞 Root Cause

✅ Fix

Uh oh!

olliestanley commented Dec 8, 2025

Uh oh!

asaberui1 commented Dec 9, 2025

Uh oh!

asaberui1 commented Dec 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

asaberui1 commented Nov 30, 2025 •

edited

Loading